Compiler and Runtime Support for Shared Memory Parallelization of Data Mining Algorithms
Abstract
Data mining techniques focus on finding novel and useful patterns or models from large datasets. Because of the volume of the data to be analyzed, the amount of computation involved, and the need for rapid or even interactive analysis, data mining applications require the use of parallel machines. We have been developing compiler and runtime support for scalable implementations of data mining algorithms. Our work encompasses shared memory parallelization, distributed memory parallelization, and optimizations for processing disk-resident datasets. In this paper, we focus on compiler and runtime support for shared memory parallelization of data mining algorithms. We have developed a set of parallelization techniques that apply across algorithms for a variety of mining tasks. We describe the interface of the middleware where these techniques are implemented. Then, we present compiler techniques for translating data parallel code to the middleware specification. Finally, we present a brief evaluation of our compiler using apriori association mining and k-means clustering.
Similar resources
Design and Evaluation of a High-Level Interface for Data Mining
This paper presents a case study in developing an application class specific high-level interface for shared memory parallel programming. The application class we focus on is data mining. With the availability of large datasets in areas like bioinformatics, medical informatics, scientific data analysis, financial analysis, telecommunications, retailing, and marketing, data mining tasks have bec...
Compiler and Middleware Support for Scalable Data Mining
High performance data mining is emerging as an important class of parallel applications. The expertise and effort required in implementing, maintaining, and performance tuning a parallel data mining application is currently an impediment to the wide use of parallel computers for data mining. We have developed a data parallel dialect of Java that can be used for expressing common data m...
Compiling Sequential Programs for Speculative Parallelism
We present a runtime system and a parallelizing compiler for exploiting speculative parallelism in sequential programs. In speculative executions, the computation consists of tasks which may start before their data or control dependencies are resolved; dependency violation is detected and corrected at runtime. Our runtime system provides a shared memory abstraction and ensures that shared acces...
Shared memory multiprocessor support for functional array processing in SAC
Classical application domains of parallel computing are dominated by processing large arrays of numerical data. Whereas most functional languages focus on lists and trees rather than on arrays, SaC is tailor-made in design and in implementation for efficient high-level array processing. Advanced compiler optimizations yield performance levels that are often competitive with low-level imperative...
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences
This paper presents a time stamp algorithm for runtime parallelization of general DOACROSS loops that have indirect access patterns. The algorithm follows the INSPECTOR/EXECUTOR scheme and exploits parallelism at a fine-grained memory reference level. It features a parallel inspector and improves upon previous algorithms of the same generality by exploiting parallelism among consecutive reads ...
Publication date: 2002